The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture
Authors
Abstract
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should use thread-level control speculation to exploit more thread-level parallelism. The instruction- and thread-level speculative execution of load instructions in a multithreaded architecture has a greater impact on the performance of the cache hierarchy as the design becomes more aggressive, using wider-issue processors and more thread units. In this study, we investigate the effects of executing mispredicted load instructions on the cache performance of a scalable multithreaded computer system. The execution of loads down a wrongly predicted branch path within a thread unit, or in a wrongly forked thread, can result in an indirect prefetching effect for correct execution. This is possible even after the outcome of a control speculation is known. By allowing mispredicted load instructions to continue execution even after the instruction- or thread-level control speculation is known to have failed, we show that we can reduce the cache misses for the correctly predicted paths and threads. However, these additional loads can also increase the amount of memory traffic and pollute the cache. Our results show that the performance of a concurrent multithreaded architecture can be improved by as much as 14%, while the number of L1 data cache misses is reduced by up to 35%.
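The indirect prefetching and cache pollution effects described in the abstract can be illustrated with a toy model. The sketch below is not the simulator used in the study. It assumes a tiny direct-mapped cache, and the class name `DirectMappedCache`, the function `count_correct_path_misses`, the cache geometry, and the address trace are all hypothetical, chosen only so that both effects show up in a few accesses.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical geometry: 8 lines of 32 bytes)."""

    def __init__(self, num_lines=8, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        hit = self.tags[index] == block
        self.tags[index] = block
        return hit


def count_correct_path_misses(trace, run_wrong_path_loads):
    """Count misses seen by correct-path loads only.

    trace is a list of (path, address) pairs, where path is "correct" or
    "wrong". Wrong-path loads touch the cache only when
    run_wrong_path_loads is True, and their own misses are never counted.
    """
    cache = DirectMappedCache()
    misses = 0
    for path, addr in trace:
        if path == "wrong":
            if run_wrong_path_loads:
                cache.access(addr)  # fills a line as a side effect
            continue
        if not cache.access(addr):
            misses += 1
    return misses


# Made-up trace: two wrong-path loads touch lines the correct path needs
# soon after (prefetch effect); a third evicts a line the correct path
# reuses (pollution effect).
TRACE = [
    ("correct", 0x000),
    ("wrong", 0x040),
    ("wrong", 0x060),
    ("correct", 0x040),
    ("correct", 0x060),
    ("wrong", 0x100),  # maps to the same cache line as 0x000
    ("correct", 0x000),
]

if __name__ == "__main__":
    print("squashed wrong-path loads:", count_correct_path_misses(TRACE, False))
    print("executed wrong-path loads:", count_correct_path_misses(TRACE, True))
```

On this trace the correct path sees three misses when wrong-path loads are squashed and two when they are allowed to complete: two misses are removed by the prefetching effect and one is added by pollution. The net benefit in a real system depends on the workload and the cache configuration.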
Similar resources
Using Incorrect Speculation to Prefetch Data in a Concurrent Multithreaded Processor
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. The resulting speculative issuing of load instructions in these architectures can significantly impact the performance of the memory hierarchy as the system exploits higher degrees of parallelism. In this study, we in...
Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures
Trace-Level Speculative Multithreaded Processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. Speculated traces are validated by ver...
A Non-blocking Multithreaded Architecture with Support for Speculative Threads
In this paper we provide both a qualitative and a quantitative evaluation of a decoupled multithreaded architecture that uses non-blocking threads. Our architecture is based on simple in-order pipelines and complete decoupling of memory accesses from execution pipelines. We extend the architecture to support thread level speculation using snooping cache coherency protocols. We evaluate the perf...
A Study of Mispredicted Branches Dependent on Load Misses in Continual Flow Pipelines
Large instruction window processors can achieve high performance by supplying more instructions during long-latency load misses, thus effectively hiding these latencies. Continual Flow Pipeline (CFP) architectures provide high performance by effectively increasing the number of actively executing instructions without increasing the size of the cycle-critical structures. A CFP consists of a Slic...
Speculative Precomputation
Current processors are based on a multithreaded architecture. Simultaneous Multithreading (SMT) techniques are used to increase instruction throughput under a multiprogramming workload; however, they do not improve performance when only a single thread is executing. This communication explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture...